AITopics | in-distribution generalization

Collaborating Authors

in-distribution generalization

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

C Improving Generalization in Regression

Neural Information Processing SystemsApr-24-2026, 18:50:42 GMT

artificial intelligence, generalization, machine learning, (14 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

1626be0ab7f3d7b3c639fbfd5951bc40-Supplemental-Conference.pdf

Neural Information Processing SystemsFeb-7-2026, 15:16:46 GMT

c-mixup, dataset, probability, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Oceania > New Zealand (0.04)
Oceania > Australia (0.04)
(8 more...)

Industry:

Health & Medicine (0.93)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

C-Mixup: ImprovingGeneralizationinRegression

Neural Information Processing SystemsFeb-7-2026, 15:16:42 GMT

Improving the generalization of deep networks is an important open challenge, particularly in domains without plentiful data.

artificial intelligence, generalization, machine learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Layer Specialization Underlying Compositional Reasoning in Transformers

Liu, Jing

arXiv.org Artificial IntelligenceOct-21-2025

Transformers exhibit compositional reasoning on sequences not observed during training, a capability often attributed to in-context learning (ICL) and skill composition. We investigate this phenomenon using the Random Hierarchy Model (RHM), a probabilistic context-free grammar that generates sequences through recursive rule application. Models are trained on subsets of sequences and evaluated across four generalization conditions: memorization, in-distribution generalization, out-of-distribution generalization with the same rules, and cross-layer transfer. Behaviorally, performance improves systematically with task complexity and the number of in-context examples, with out-of-distribution tasks requiring substantially more examples than in-distribution scenarios. Mechanistically, we identify a progressive emergence of layer specialization during training that correlates with generalization performance. Principal component analysis and attention pattern clustering reveal that transformers develop structured, hierarchically organized representations in specialized layers. These results demonstrate that transformers develop modular, interpretable mechanisms supporting compositional reasoning, linking internal algorithmic structure to observed behavioral capabilities.

machine learning, natural language, pattern recognition, (17 more...)

arXiv.org Artificial Intelligence

2510.17469

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.66)

Add feedback

Changing the Training Data Distribution to Reduce Simplicity Bias Improves In-distribution Generalization

Neural Information Processing SystemsMay-27-2025, 06:38:21 GMT

Can we modify the training data distribution to encourage the underlying optimization method toward finding solutions with superior generalization performance on in-distribution data? In this work, we approach this question for the first time by comparing the inductive bias of gradient descent (GD) with that of sharpness-aware minimization (SAM). By studying a two-layer CNN, we rigorously prove that SAM learns different features more uniformly, particularly in early epochs. That is, SAM is less susceptible to simplicity bias compared to GD. We also show that examples constraining features that are learned early are separable from the rest based on the model's output.

generalization performance, in-distribution generalization, training data distribution, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Out-of-distribution generalization for learning quantum dynamics

Caro, Matthias C., Huang, Hsin-Yuan, Ezzell, Nicholas, Gibbs, Joe, Sornborger, Andrew T., Cincio, Lukasz, Coles, Patrick J., Holmes, Zoë

arXiv.org Artificial IntelligenceJul-9-2023

Generalization bounds are a critical tool to assess the training data requirements of Quantum Machine Learning (QML). Recent work has established guarantees for in-distribution generalization of quantum neural networks (QNNs), where training and testing data are drawn from the same data distribution. However, there are currently no results on out-of-distribution generalization in QML, where we require a trained model to perform well even on data drawn from a different distribution to the training distribution. Here, we prove out-of-distribution generalization for the task of learning an unknown unitary. In particular, we show that one can learn the action of a unitary on entangled states having trained only product states. Since product states can be prepared using only single-qubit gates, this advances the prospects of learning quantum dynamics on near term quantum hardware, and further opens up new methods for both the classical and quantum compilation of quantum circuits.

artificial intelligence, ensemble, machine learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1038/s41467-023-39381-w

2204.10268

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
(7 more...)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.67)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

Add feedback

C-Mixup: Improving Generalization in Regression

Yao, Huaxiu, Wang, Yiping, Zhang, Linjun, Zou, James, Finn, Chelsea

arXiv.org Artificial IntelligenceOct-11-2022

Improving the generalization of deep networks is an important open challenge, particularly in domains without plentiful data. The mixup algorithm improves generalization by linearly interpolating a pair of examples and their corresponding labels. These interpolated examples augment the original training set. Mixup has shown promising results in various classification tasks, but systematic analysis of mixup in regression remains underexplored. Using mixup directly on regression labels can result in arbitrarily incorrect labels. In this paper, we propose a simple yet powerful algorithm, C-Mixup, to improve generalization on regression tasks. In contrast with vanilla mixup, which picks training examples for mixing with uniform probability, C-Mixup adjusts the sampling probability based on the similarity of the labels. Our theoretical analysis confirms that C-Mixup with label similarity obtains a smaller mean square error in supervised regression and meta-regression than vanilla mixup and using feature similarity. Another benefit of C-Mixup is that it can improve out-of-distribution robustness, where the test distribution is different from the training distribution. By selectively interpolating examples with similar labels, it mitigates the effects of domain-associated information and yields domain-invariant representations. We evaluate C-Mixup on eleven datasets, ranging from tabular to video data. Compared to the best prior approach, C-Mixup achieves 6.56%, 4.76%, 5.82% improvements in in-distribution generalization, task generalization, and out-of-distribution robustness, respectively. Code is released at https://github.com/huaxiuyao/C-Mixup.

artificial intelligence, inductive learning, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2210.05775

Country:

Africa (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Oceania > New Zealand (0.04)
(8 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (0.93)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.68)

Add feedback